Power Iteration Clustering
Authors
Abstract
We present a simple and scalable graph clustering method called power iteration clustering (PIC). PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data. This embedding turns out to be an effective cluster indicator, consistently outperforming widely used spectral methods such as NCut on real datasets. PIC is very fast on large datasets, running over 1,000 times faster than an NCut implementation based on the state-of-the-art IRAM eigenvector computation technique.
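The procedure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation. The abstract does not specify the similarity kernel, the starting vector, the stopping rule, or how the one-dimensional embedding is turned into cluster labels. The Gaussian kernel, the degree-based start vector, the threshold `1e-5 / n`, the simple 1-D k-means step, and all function names below are choices made here for illustration.

```python
import math
import random


def gaussian_affinity(points, sigma=1.0):
    """Pairwise Gaussian similarities with a zero diagonal (assumed kernel)."""
    n = len(points)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            affinity[i][j] = affinity[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return affinity


def pic_embedding(affinity, max_iter=100, tol=None):
    """Truncated power iteration on the row-normalized matrix W = D^-1 A.

    Assumes every row of `affinity` has a positive sum (no isolated points).
    """
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    walk = [[a / d for a in row] for row, d in zip(affinity, degrees)]
    total = sum(degrees)
    v = [d / total for d in degrees]          # assumed start vector
    tol = 1e-5 / n if tol is None else tol    # assumed stopping threshold
    prev_delta = math.inf
    for _ in range(max_iter):
        v_next = [sum(w * x for w, x in zip(row, v)) for row in walk]
        norm = sum(abs(x) for x in v_next)    # L1 normalization
        v_next = [x / norm for x in v_next]
        delta = max(abs(a - b) for a, b in zip(v_next, v))
        v = v_next
        # Truncate once the step size stops changing noticeably, which is
        # well before v flattens into the uninformative constant vector.
        if abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd iterations on scalars, with evenly spaced initial centers (k >= 2)."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in values]
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            if members:                       # keep the old center if a cluster empties
                centers[j] = sum(members) / len(members)
    return labels


# Toy data: two well-separated 2-D blobs of different sizes.
rng = random.Random(0)
points = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(30)]
points += [(rng.gauss(10.0, 0.3), rng.gauss(10.0, 0.3)) for _ in range(60)]

labels = kmeans_1d(pic_embedding(gaussian_affinity(points)), k=2)
print(labels[:30], labels[30:])
```

The embedding is a single vector with one value per data point. Points in the same tightly connected group quickly take nearly identical values, so a cheap one-dimensional clustering step is enough to separate the groups. The dense matrix used here is only for readability; a scalable version would presumably rely on sparse matrix-vector products.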
Similar resources
Parallelized Power Iteration Clustering of Nouns and Verbs using Subject-Verb and Verb-Object Pairs
We explore the use of Power Iteration Clustering for large-scale clustering of nouns and verbs, using subject-verb-object triple relations. We have implemented a parallelized version of PIC, which can efficiently handle clustering of hundreds of thousands of sparse “documents.” We tested our implementation on clustering of over 1 million noun phrases, and over 650K verb phrases. We also evaluat...
Parallel Power Iteration Clustering for Big Data using MapReduce in Hadoop
Distributed data mining is currently a popular research topic: as data volumes grow day by day, handling them raises many problems, and the existing solutions still fall short of expectations. Of the open issues in distributed data mining, this paper focuses mainly on reduci...
Client Based Power Iteration Clustering Algorithm to Reduce Dimensionality in Big Data
Clustering groups objects so that those in the same cluster are similar to one another but dissimilar to objects in other clusters. Clustering large datasets is a challenging task, and the need for greater scalability and performance motivates the use of parallelism. Though the use of Big Data has become essential, analyzing it is demanding. This paper presents the (pC-PIC) parallel Client based Power Itera...
GPIC - GPU Power Iteration Cluster
This work presents a new clustering algorithm, GPIC, a Graphics Processing Unit (GPU) accelerated algorithm for Power Iteration Clustering (PIC). Our algorithm is based on the original PIC proposal, adapted to take advantage of the GPU architecture while maintaining the algorithm's original properties. The proposed method was compared against the serial and parallel Spark implementation, achieving a...
An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering
Here, an algorithm is presented for solving the minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real world data sets.
Journal title:
Volume, issue:
Pages: -
Publication date: 2010